RNA sequencing identifies lung cancer lineage and facilitates drug repositioning

Recent breakthrough therapies have improved survival rates in non-small cell lung cancer (NSCLC), but a paradigm for prospective confirmation is still lacking. Patientdatasets were mainly downloaded from TCGA, CPTAC and GEO. We conducted downstream analysis by collecting metagenes and generated 42-gene subtype classifiers to elucidate biological pathways. Subsequently, scRNA, eRNA, methylation, mutation, and copy number variation were depicted from a phenotype perspective. Enhancing the clinical translatability of molecular subtypes, preclinical models including CMAP, CCLE, and GDSC were utilized for drug repositioning. Importantly, we verified the presence of previously described three phenotypes including bronchioid, neuroendocrine, and squamoid. Poor prognosis was seen in squamoid and neuroendocrine clusters for treatment-naive and immunotherapy populations. The neuroendocrine cluster was dominated by STK11 mutations and 14q13.3 amplifications, whose related methylated loci are predictive of immunotherapy. And the greatest therapeutic potential lies in the bronchioid cluster. We further estimated the relative cell abundance of the tumor microenvironment (TME), specific cell types could be reflected among three clusters. Meanwhile, the higher portion of immune cell infiltration belonged to bronchioid and squamoid, not the neuroendocrine cluster. In drug repositioning, MEK inhibitors resisted bronchioid but were squamoid-sensitive. To conceptually validate compounds/targets, we employed RNA-seq and CCK-8/western blot assays. Our results indicated that dinaciclib and alvocidib exhibited similar activity and sensitivity in the neuroendocrine cluster. Also, a lineage factor named KLF5 recognized by inferred transcriptional factors activity could be suppressed by verteporfin.


Introduction
It is widely generally accepted that the development of advanced non-small cell 46 lung cancer (NSCLC) is based on genetic mutations, some of which are driver 47 mutations. Tyrosine kinase inhibitor therapy is used to target mutated oncogenes. 1,2 48 However, a large proportion of the population lacks a targetable driver and is referred 49 to as "driver-negative". 2 In recent years, anti-programmed death 1 (PD-1/PD-L1) 50 immune checkpoint inhibitor (ICI) therapies become the keystone of first-line therapy 51 for driver-negative NSCLC. 2 Preclinical evidence suggests that the addition of 52 anti-PD-1/PD-L1 therapies improved prognosis in patients treated with combined 53 pemetrexed and platinum and showed great potential value in nonsquamous NSCLC 54 treatment. 2,3 The TME suitable for ICI treatments is defined as "hot-immune" with 55 higher cytotoxic T cell infiltration and tumor mutation burden (TMB). 4 56 TCGA revealed the existence of three transcriptional subtypes in LUAD, 57 including bronchioid/terminal respiratory unit (TRU), magnoid/proximal-proliferative 58 (PP), and squamoid/proximal-inflammatory (PI). 5 TCGA group proved TRU subtype 59 was associated with a better prognosis and was enriched for EGFR mutations. The 60 loss function of KRAS and STK11 occurred frequently in the PP subtype. While the 61 PI subtype was described as a co-occurrence of TP53 and NF1. Importantly, PP and 62 PI patients have a poorer prognosis compared to TRU, which may be due to their 63 highly proliferative characteristics. 5,6 Ringnér et al. reported a technical bias in the 64 TCGA subtype and that metagenes may be a valid improvement. 6 16 Note that hierarchical clustering (HC) was used as orthogonal verification. 94 For HC, median normalized expression and complete distance were adopted. A high 95 confidence queue is considered to have an overall accuracy of over 0.8 and Kappa 96 over 0.6 after cross-validation. Additionally, one thousand highly variable gene 97 expression matrix was used as input for principal component analysis using the R 98 packages "factoextra" and "FactoMineR" to assess the dissimilarity of the clusters. 99 Using the limma package of R, differential expression gene (DEG) analysis was 100 performed. 17 DEG was defined as log fold-change > 0.7. To characterize the diversity 101 of clusters, we selected metagenes from previous publications (Supplementary Table   102 3). 3 198 We assume that the smallest subset will benefit from targeted therapies. According to 199 our observations, the most pronounced neuroendocrine profile was in the PP-3 200 subtype, which may reflect the original PP subtype (Supplementary after overall consideration ( Fig. S1A-B). The 42-gene classified patients as 205 bronchioid, neuroendocrine and squamoid clusters (Fig. S1C). As expected, the PP-3 206 subtype is a subset of our neuroendocrine cluster (Fig. S2A). 39 It is interesting to note 207 that the mesenchymal subtype is also a subset of the bronchioid cluster (Fig. S2B) Table 6). 214 Furthermore, we first explore prognostic value among three clusters, the shortest 215 OS was in the squamous cluster ( Fig. 1A-B  variable gene expression data (see "methods") ( Fig. 1A-B)  cluster possessed potential efficiency with pemetrexed monotherapy (Fig. 1C-D). 236 We selected four lung-specific metagenes to represent clusters described in 237   Supplementary Table 3. 3,6,18 Using the R package GSVA, we found that 238 neuroendocrine and squamous shared proliferation among four cohorts ( Fig. 2A) the immune profiles in transdifferentiation-related clusters. Among three cohorts, the 244 least immune infiltrate was in the neuroendocrine cluster ( Fig. 2A). The bronchioid 245 cluster had the highest proportion of resting mast cells, but the lowest proportion of 246 activated memory CD4 + T cells using CIBERSORT algorithm (Fig. 2B). 23 After 247 considering high-resolution dataset, we found squamous cluster was most related to 248 T-cell status but had the least CellPhoneDB inferred cellular communication using 249 Scissor method ( Fig. S3A-C, Supplementary Table 8). 43 (Fig. 2D). 47 Furthermore, we focused on four major mutations (EGFR, 257 KRAS, STK11, and TP53) in CPTAC-LUAD, GSE72094 and TCGA-LUAD cohorts 258 (Fig. 2E). Importantly, STK11 mutations were mainly distributed in the 259 neuroendocrine cluster. The distribution of mutations among clusters is very different 260 across three cohorts (e.g., KRAS and EGFR), which may be due to ethnicity and 261 smoking history. In conclusion, the neuroendocrine cluster is characterized as an 262 13 immune cold partially because of STK11 mutations, which show resistance to ICI 263 therapies ( Fig. 2A, 2E). 1,2 Then, mutations with prognostic significance in clusters 264 were paid attention to ZNF536, PXDNL, ADGRB3, and ADGRL3 mutations were 265 almost greater than 10% in frequency by in the cBioportal website (Fig. S4A). In the 266 bronchioid cluster, the mutant type of ZNF536 and PXDNL showed longer OS than 267 the wild type, the other clusters were the opposite (Fig. 2F) mutations showed a better prognostic trend in both cohorts (Fig. S4B). Meanwhile, 272 ADGRB3 mutations are related to a favorable prognosis in Hellmann's nonsquamous 273 NSCLC cohort receiving PD-1 plus CTLA-4 blockade (Fig. S4C)

278
Considering that low expression of major histocompatibility complex II (MHC-II) 279 is a distinctive feature of the neuroendocrine cluster, we gain further insight into the 280 underlying mechanisms (Fig. 2A). A previous study suggested the expression of 281 MHC-II was mainly regulated by enhancers, the bronchial, squamous and 282 neuroendocrine clusters from highest to lowest according to the level of MHC-II 283 enhancers (Fig. 3A). 21 We also found a similar trend in global-methylation level ( Fig.   284 14 3B). Importantly, there was a high correlation coefficient (R = 0.47, Fig. 3C was found in the neuroendocrine cluster (Fig. 3D). Interestingly, the opposite result 293 occurred between the clusters when using immuno-specific eRNAs (Fig. 3E). In   Table 9). 48 We further analyzed the CCLE and GDSC databases, 305 and for GDSC only the top 100 compounds with high lethality were included 306 15 (Supplementary Table 9). Through larger-scale drug-sensitive datasets predicted by 307 R package oncoPredict, our results show that MEK inhibitors may favor squamoid 308 cluster based on CCLE and GDSC datasets (Fig. S5B-C). 37 neuroendocrine cluster (Fig. 4A). 5,40,49 As expected, alvocidib could inhibit the 314 proliferation of NCI-H1944, which was identified as a neuroendocrine cluster (Fig.   315 4B). Given that the second-generation pan-CDK inhibitor dinaciclib may be superior 316 to than the first-generation pan-CDK inhibitor alvocidib, we further validated the 317 proliferative effect of dinaciclib in tumor and normal epithelial cell lines (Fig. 4C). 318 Our results suggest that dinaciclib is promising in the neuroendocrine cluster and that 319 we should be aware of concentration. 320 A recent study proved drug sensitivity could be inferred through expression 321 profiles. 38 We asked whether one nonlinear regression model named random forest 322 could discover drugs, with similar functional mechanisms. Interestingly, PLK1 323 inhibitors and EGFR inhibitors show a high degree of similarity (about 20%-50%). In 324 addition, both dinaciclib and alvocidib reduced interferon and metabolic processes, 325 and increase extracellular matrix and squamous remodeling (Fig. 4D). Taken  was refined, and further, dissect the pre-existing TME specifically into effector and 334 suppressor cells. 18 We obtained robust immune gene sets and proved that immune 335 cold means complete resistance to PD-L1 blockade, although immune hot does not 336 guarantee benefits. Together with our results, the abundance of Treg cells is also a 337 potential indicator of anti-PD-1/PD-L1 therapies (Fig. 5A). 4

338
We assume that lung lineage development maintains immune balance. Two 339 networks were constructed based on scRNA and epithelial bulk sequencing, using the 340 immune and epithelial gene sets, respectively (Fig. 5B-C). 15 349 We first assessed common clinical indexes related to immunotherapy including TMB, 350 CNVs, and tumor purity. 1,2 Among the three clusters, the neuroendocrine cluster had 351 highest TMB, CNVs and tumor purity (Fig. S6A-B). Both tumor purity and 352 cross-platform influenced the robustness of clusters. We believe that the single sample 353 predictor algorithm generated by our clusters is promising for testing personalized 354 therapies through NanoString nCounter technology. 52

355
Our cluster not only reflected local perturbation-related immunity but also 356 complemented the existing immunophenotypes. 53 We observed the proportion of 357 immunophenotypes finding large inconsistencies in different datasets (e.g. TGF beta 358 dominant). Nonetheless, the relative distribution between clusters and 359 immunophenotypes was evident, as reflected in the bronchioid cluster, mainly in 360 lymphocyte depleted (Fig. S7) (Fig. 6).

370
In this study, we established three clusters across samples, PDXs, and cell lines. 371 Prognosis, genomics, genetics, and TME were depicted among bronchial, 372 neuroendocrine and squamoid clusters. We proved that immune pathways did 373 recognize lineage factors, and focused on the analysis of drug sensitivity differences 374 in three lung cancer lineages. 375 Our results confirmed that classical pathways can profile molecular of STK11 mutations and NKX2-1 amplifications, but the lowest immune infiltration. 385 It is confirmed that bronchioid is immune activated, which may be due to EGFR 386 mutations neglected in the past. 1 Fig. S8A-B)